AITopics | learned index

Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes

Heidari, Alireza, Ahmad, Amirhossein, Zhang, Wei, Xiong, Ying

arXiv.org Artificial Intelligence, Sep-26-2025

Learned Indexes (LIs) represent a paradigm shift from traditional index structures by employing machine learning models to approximate the cumulative distribution function (CDF) of sorted data. While LIs achieve remarkable efficiency for static datasets, their performance degrades under dynamic updates: maintaining the CDF invariant (sum of F(k) equals 1) requires global model retraining, which blocks queries and limits the queries-per-second (QPS) metric. Current approaches fail to address these retraining costs effectively, rendering them unsuitable for real-world workloads with frequent updates. In this paper, we present Sig2Model, an efficient and adaptive learned index that minimizes retraining cost through three key techniques: (1) a sigmoid boosting approximation technique that dynamically adjusts the index model by approximating update-induced shifts in data distribution with localized sigmoid functions while preserving bounded error guarantees and deferring full retraining; (2) proactive update training via Gaussian mixture models (GMMs) that identifies high-update-probability regions for strategic placeholder allocation to speed up updates; and (3) a neural joint optimization framework that continuously refines both the sigmoid ensemble and GMM parameters via gradient-based learning. We evaluate Sig2Model against state-of-the-art updatable learned indexes on real-world and synthetic workloads, and show that Sig2Model reduces retraining cost by up to 20x, achieves up to 3x higher QPS, and uses up to 1000x less memory.
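To make the first technique in this abstract more concrete, here is a minimal Python sketch of the general idea of patching a position model with sigmoids instead of retraining it. It is not the Sig2Model algorithm: the class name, the `steepness` parameter, and the plain least-squares base model are all assumptions made for illustration. Inserting a key shifts the position of every larger key by one, and a steep sigmoid centred at the inserted key approximates that step.

```python
import math

def fit_linear(keys):
    """Least-squares line mapping each key to its position in the sorted list."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_p - slope * mean_k

def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class SigmoidPatchedModel:
    """Hypothetical illustration: base model plus one sigmoid per insert."""

    def __init__(self, keys, steepness=50.0):
        self.slope, self.intercept = fit_linear(keys)  # trained once
        self.steepness = steepness                      # assumed constant
        self.centers = []                               # keys inserted so far

    def record_insert(self, key):
        # No retraining: remember where the +1 position shift begins.
        self.centers.append(key)

    def predict(self, key):
        base = self.slope * key + self.intercept
        shift = sum(sigmoid(self.steepness * (key - c)) for c in self.centers)
        return base + shift
```

In this sketch the cost of a prediction grows with the number of recorded inserts, which is one reason a real system would still need to retrain eventually; the abstract describes the sigmoids as deferring full retraining, not replacing it.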

artificial intelligence, machine learning, sig2model, (17 more...)

arXiv.org Artificial Intelligence

2509.20781

Country: North America (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD

Wang, Wenlong, Du, David Hung-Chang

arXiv.org Artificial Intelligence, Jun-27-2024

In this paper, we introduce LearnedKV, a novel tiered key-value (KV) store that seamlessly integrates a Log-Structured Merge (LSM) tree with a Learned Index. This integration yields superior read and write performance compared to standalone indexing structures on SSDs. Our design capitalizes on the LSM tree's high write/update throughput and the Learned Index's fast read capabilities, enabling each component to leverage its strengths. We analyze the impact of size on LSM tree performance and demonstrate how the tiered Learned Index significantly mitigates the LSM tree's size-related performance degradation, particularly by reducing the intensive I/O operations resulting from re-insertions after Garbage Collection (GC). To maintain rapid read performance for newly inserted keys, we introduce a non-blocking conversion mechanism that efficiently transforms the existing LSM tree into a new Learned Index with minimal overhead during GC. Our experimental results, conducted across diverse workloads, show that LearnedKV outperforms state-of-the-art solutions by up to 1.32x in read requests and 1.31x in write performance.
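The tiered read and write path described in this abstract can be pictured with a toy Python sketch. Everything in it is a stand-in and an assumption: a dict plays the role of the LSM tree, a sorted list searched with `bisect` plays the role of the Learned Index tier, `gc_threshold` is a made-up trigger, and the conversion here is blocking, whereas the paper describes a non-blocking mechanism.

```python
import bisect

class TieredKV:
    """Toy two-tier store; not the LearnedKV implementation."""

    def __init__(self, gc_threshold=4):
        self.write_tier = {}      # stand-in for the LSM tree (absorbs writes)
        self.sorted_keys = []     # stand-in for the learned-index tier (read-only)
        self.sorted_values = []
        self.gc_threshold = gc_threshold  # hypothetical conversion trigger

    def put(self, key, value):
        # All writes and updates land in the write tier.
        self.write_tier[key] = value
        if len(self.write_tier) >= self.gc_threshold:
            self._convert()

    def get(self, key):
        # Newest data first, then the read-optimised tier.
        if key in self.write_tier:
            return self.write_tier[key]
        i = bisect.bisect_left(self.sorted_keys, key)
        if i < len(self.sorted_keys) and self.sorted_keys[i] == key:
            return self.sorted_values[i]
        return None

    def _convert(self):
        # Merge both tiers into a fresh sorted tier and empty the write tier.
        merged = dict(zip(self.sorted_keys, self.sorted_values))
        merged.update(self.write_tier)
        self.sorted_keys = sorted(merged)
        self.sorted_values = [merged[k] for k in self.sorted_keys]
        self.write_tier = {}
```

The point of the sketch is only the division of labour: the write tier stays small because it is periodically drained, and reads that miss it fall through to a structure built for lookups.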

key-value pair, learned index, operation, (17 more...)

arXiv.org Artificial Intelligence

2406.18892

Country: North America > United States > Minnesota (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology (0.67)
Water & Waste Management > Solid Waste Management (0.56)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training

Kim, Minsu, Hwang, Jinwoo, Heo, Guseul, Cho, Seiyeon, Mahajan, Divya, Park, Jongse

arXiv.org Artificial Intelligence, Mar-18-2024

Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their models to incorporate the changes introduced by update queries. To efficiently retrain the models, existing learned index systems often harness a linear algebraic QR factorization technique that performs matrix decomposition. This factorization approach processes all key-position pairs during each retraining, resulting in compute operations that grow linearly with the total number of keys and their lengths. Consequently, the retrainings create a severe performance bottleneck, especially for variable-length string keys, while the retrainings are crucial for maintaining high prediction accuracy and in turn, ensuring low query service latency. To address this performance problem, we develop an algorithm-hardware co-designed string-key learned index system, dubbed SIA. In designing SIA, we leverage a unique algorithmic property of the matrix decomposition-based training method. Exploiting the property, we develop a memoization-based incremental training scheme, which only requires computation over updated keys, while decomposition results of non-updated keys from previous computations can be reused. We further enhance SIA to offload a portion of this training process to an FPGA accelerator to not only relieve CPU resources for serving index queries (i.e., inference), but also accelerate the training itself. Our evaluation shows that compared to ALEX, LIPP, and SIndex, state-of-the-art learned index systems, SIA-accelerated learned indexes offer 2.6x and 3.4x higher throughput on the two real-world benchmark suites, YCSB and Twitter cache trace, respectively.
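The reuse property mentioned in this abstract can be illustrated with a textbook technique: a QR-based least-squares fit can be updated one row at a time with Givens rotations, so that the triangular factor computed from old keys is kept and only new rows cost any work. The Python sketch below shows that standard row-update, not SIA's actual scheme; the function names are hypothetical, and numeric feature vectors are used for brevity where SIA targets string keys.

```python
import math

def new_factor(d):
    """Empty (d+1) x (d+1) upper-triangular factor for rows [features | target]."""
    return [[0.0] * (d + 1) for _ in range(d + 1)]

def absorb_row(R, features, target):
    """Fold one new training row into R in place using Givens rotations.

    Cost is O(d^2) per row and does not depend on how many rows were
    absorbed earlier, which is the reuse this sketch is meant to show.
    """
    row = list(features) + [target]
    for j in range(len(row)):
        a, b = R[j][j], row[j]
        r = math.hypot(a, b)
        if r == 0.0:
            continue
        c, s = a / r, b / r
        for k in range(j, len(row)):
            # Rotate the pair (R[j][k], row[k]); this zeroes row[j].
            R[j][k], row[k] = c * R[j][k] + s * row[k], -s * R[j][k] + c * row[k]

def solve(R):
    """Back-substitute for the least-squares weights (assumes full column rank)."""
    d = len(R) - 1
    w = [0.0] * d
    for i in reversed(range(d)):
        acc = R[i][d] - sum(R[i][k] * w[k] for k in range(i + 1, d))
        w[i] = acc / R[i][i]
    return w
```

A possible usage pattern: call `absorb_row` for every key in the initial batch, call `solve` to get model weights, and later call `absorb_row` only for keys that arrived since, followed by `solve` again. Handling deleted or modified keys would need a downdate step that this sketch leaves out.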

computing machinery, proceedings, throughput, (13 more...)

arXiv.org Artificial Intelligence

2403.11472

Country:

North America > United States > Oregon > Multnomah County > Portland (0.14)
North America > United States > New York > New York County > New York City (0.06)
North America > United States > Washington > King County > Seattle (0.04)
(14 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Survey of Learned Indexes for the Multi-dimensional Space

Al-Mamun, Abdullah, Wu, Hao, He, Qiyang, Wang, Jianguo, Aref, Walid G.

arXiv.org Artificial Intelligence, Mar-11-2024

A recent research trend involves treating database index structures as Machine Learning (ML) models. In this domain, single or multiple ML models are trained to learn the mapping from keys to positions inside a data set. This class of indexes is known as "Learned Indexes." Learned indexes have demonstrated improved search performance and reduced space requirements for one-dimensional data. The concept of one-dimensional learned indexes has naturally been extended to multi-dimensional (e.g., spatial) data, leading to the development of "Learned Multi-dimensional Indexes". This survey focuses on learned multi-dimensional index structures. Specifically, it reviews the current state of this research area, explains the core concepts behind each proposed method, and classifies these methods based on several well-defined criteria. We present a taxonomy that classifies and categorizes each learned multi-dimensional index, and survey the existing literature on learned multi-dimensional indexes according to this taxonomy. Additionally, we present a timeline to illustrate the evolution of research on learned indexes. Finally, we highlight several open challenges and future research directions in this emerging and highly active field.

learned index, multi-dimensional index, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2403.06456

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(6 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)
(4 more...)

Learned Sorted Table Search and Static Indexes in Small Model Space

Amato, Domenico, Bosco, Giosuè Lo, Giancarlo, Raffaele

arXiv.org Artificial Intelligence, Sep-17-2022

Machine Learning Techniques, properly combined with Data Structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Search at the cost of additional space with respect to the table being searched. Such space is devoted to the Machine Learning Model. Although in their infancy, they are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor and, in fact, a major open question concerning this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant space models. In this paper, we investigate this question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index; (b) systematically exploring the time-space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model can speed up Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model index, can achieve a speed-up of Binary Search with a model space of 0.05% more than the one taken by the table, being competitive in terms of time-space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time-space relation in Learned Indexes.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2107.0948

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Sicily > Palermo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform

Amato, Domenico, Bosco, Giosuè Lo, Giancarlo, Raffaele

arXiv.org Artificial Intelligence, Jul-8-2022

Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search, and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, usually, the lower_bound routine of the Standard C++ library is used, although this is more of a natural choice than a requirement. However, recent studies that do not use Machine Learning predictions indicate that other implementations of Binary Search or variants, namely k-ary Search, are better suited to take advantage of the features offered by modern computer architectures. With the use of the Searching on Sorted Data (SOSD) Learned Indexing benchmarking software, we investigate how to choose a Search routine for the final stage of searching in a Learned Index. Our results provide indications that better choices than the lower_bound routine can be made. We also highlight how such a choice may be dependent on the computer architecture that is to be used. Overall, our findings provide new and much-needed guidelines for the selection of the Search routine within the Learned Indexing framework.
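The choice this abstract describes, between the standard lower_bound routine and an alternative for the final search stage, can be sketched in Python. The model's predicted interval is taken as a given window `[lo, hi)`. `bisect.bisect_left` serves as the analogue of C++ `lower_bound`, and the second routine is a fixed-schedule halving loop in the spirit of uniform binary search. Whether it matches the exact variants benchmarked in the paper is an assumption, and Python will not reproduce the architectural effects the paper measures.

```python
import bisect

def standard_lower_bound(a, x, lo, hi):
    """First index i in [lo, hi] with a[i] >= x, via the library routine."""
    return bisect.bisect_left(a, x, lo, hi)

def lower_bound_fixed_steps(a, x, lo, hi):
    """Same result, but the window shrinks by a schedule that depends only
    on its length, never on the comparison outcomes.

    In a compiled language the `if` below can become a conditional move,
    which is what makes this style attractive on modern CPUs.
    """
    n = hi - lo
    if n <= 0:
        return lo
    base = lo
    while n > 1:
        half = n // 2
        if a[base + half] < x:
            base += half
        n -= half
    # One element left: step past it if it is still smaller than x.
    return base + (1 if a[base] < x else 0)
```

Both functions can be dropped into the same call site after the model has produced its window, which is what makes the final-stage routine a tunable choice.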

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1002/spe.3150

2201.01554

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
(2 more...)

The Case for B-Tree Index Structures

#artificialintelligence, Jan-15-2018, 14:31:19 GMT

Recently a very interesting paper made a Case for Learned Index Structures. It argued that we could, and perhaps should, replace traditional index structures with machine learning, using the following reasoning: If we consider the leaf pages of an index as a sorted array, the inner pages of the index point towards a (bucketized) position within that array. This means that the index essentially describes the cumulative distribution function (CDF), mapping from keys to array positions. The argument of that paper was that using machine learning we can do that mapping much better because a) the learned model (in this case a neural network) is much smaller than a traditional B-tree, and b) the learned model can predict the CDF value much more accurately than a simple B-tree, which improves performance. Now I am all in favor of trying out new ideas, and adapting to the data distribution is clearly a good idea, but do we really need a neural network for that?
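To picture the closing question, here is a minimal Python sketch of one non-neural way to approximate the key-to-position mapping: linear interpolation between a few sampled keys, followed by a short bounded search. It is only an illustration and not necessarily the approach the post goes on to propose. The class name, the `step` spacing, and the requirement of sorted, distinct, numeric keys are all assumptions.

```python
import bisect
import math

class InterpolatedIndex:
    """Hypothetical CDF approximation by piecewise-linear interpolation."""

    def __init__(self, keys, step=16):
        self.keys = keys  # assumed sorted, distinct, non-empty
        # Keep every `step`-th position (plus the last) as an interpolation knot.
        pos = list(range(0, len(keys), step))
        if pos[-1] != len(keys) - 1:
            pos.append(len(keys) - 1)
        self.knot_pos = pos
        self.knot_keys = [keys[p] for p in pos]
        # Measure the worst estimate over all keys; +1 covers int truncation.
        worst = max(abs(self.estimate(k) - i) for i, k in enumerate(keys))
        self.err = math.ceil(worst) + 1

    def estimate(self, x):
        """Interpolated position of x between its two neighbouring knots."""
        kk, kp = self.knot_keys, self.knot_pos
        if x <= kk[0]:
            return float(kp[0])
        if x >= kk[-1]:
            return float(kp[-1])
        j = bisect.bisect_right(kk, x) - 1
        frac = (x - kk[j]) / (kk[j + 1] - kk[j])
        return kp[j] + frac * (kp[j + 1] - kp[j])

    def lookup(self, x):
        """Position of x in keys, or None if x is not present."""
        guess = int(self.estimate(x))
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        i = bisect.bisect_left(self.keys, x, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == x else None
```

The model here is just the list of knots, so its size is controlled directly by `step`, and the measured `err` bounds how far the final search has to look for any stored key.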

artificial intelligence, learned index, machine learning, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
